Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

Abstract

The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.
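To make the idea of exposing more parallelism in a stencil problem concrete, the following is a minimal sketch and not code from QUDA or from the paper. It assumes a toy 1D periodic lattice with a dense N x N block per site and per direction; the names `make_problem`, `apply_site_parallel`, and `apply_fine_grained` are hypothetical. The choice to split the work over output row and stencil direction is an assumption about what finer-grained parallelism could look like, since the abstract does not list the specific sources it exploits.

```python
import random

# Offsets for the three stencil directions: self, forward, backward.
OFFSETS = (0, 1, -1)


def make_problem(num_sites, dof, seed=0):
    """Build random stencil blocks and a random input vector (toy data)."""
    rng = random.Random(seed)

    def rand_block():
        return [[rng.uniform(-1, 1) for _ in range(dof)] for _ in range(dof)]

    # mats[x][d] is the dof x dof block coupling site x to its neighbour d.
    mats = [[rand_block() for _ in OFFSETS] for _ in range(num_sites)]
    vec = [[rng.uniform(-1, 1) for _ in range(dof)] for _ in range(num_sites)]
    return mats, vec


def neighbour(x, d, num_sites):
    """Index of the neighbour of site x in direction d (periodic)."""
    return (x + OFFSETS[d]) % num_sites


def apply_site_parallel(mats, vec):
    """Coarse decomposition: one independent work item per lattice site."""
    num_sites, dof = len(vec), len(vec[0])
    out = []
    for x in range(num_sites):  # only num_sites independent work items
        acc = [0.0] * dof
        for d in range(len(OFFSETS)):
            src = vec[neighbour(x, d, num_sites)]
            for r in range(dof):
                acc[r] += sum(mats[x][d][r][c] * src[c] for c in range(dof))
        out.append(acc)
    return out


def apply_fine_grained(mats, vec):
    """Finer decomposition: one work item per (site, output row, direction).

    Each partial result is independent; a final reduction over the
    direction index recovers the stencil output.
    """
    num_sites, dof = len(vec), len(vec[0])
    partial = {}
    for x in range(num_sites):
        for r in range(dof):
            for d in range(len(OFFSETS)):
                src = vec[neighbour(x, d, num_sites)]
                partial[(x, r, d)] = sum(
                    mats[x][d][r][c] * src[c] for c in range(dof)
                )
    out = [
        [sum(partial[(x, r, d)] for d in range(len(OFFSETS))) for r in range(dof)]
        for x in range(num_sites)
    ]
    return out, len(partial)


if __name__ == "__main__":
    mats, vec = make_problem(num_sites=4, dof=6)
    _, work_items = apply_fine_grained(mats, vec)
    print("site-parallel work items:", len(vec))
    print("fine-grained work items:", work_items)
```

Both functions compute the same result up to floating-point rounding. The difference is only in how many independent work items are available to map onto hardware threads, which is the quantity that becomes scarce when a grid has few sites.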
